Abstract: The performance of data forwarding in Delay Tolerant
Networks (DTNs) improves considerably when the social structure of
human mobility is exploited. However, computing the centrality and
similarity of nodes with solutions designed for traditional social
networks is difficult and time-consuming, mainly because node
contacts are transient and the environment is intermittently
connected. In this work, we ask the following question: can other,
more stable social attributes be used to quantify the centrality and
similarity of nodes? Using GPS traces of real-world human walks, we
confirm two known phenomena: public hotspots and personal hotspots.
Motivated by this observation, we present Hoten (hotspot and
entropy), a novel routing metric to improve routing performance in
DTNs. First, we use the relative entropy between the public hotspots
and the personal hotspots to compute the centrality of nodes. Second,
we use the inverse symmetrized entropy of the personal hotspots of
two nodes to compute the similarity between them. Third, we use the
entropy of the personal hotspots of a node to estimate its
personality. In addition, we propose a method to determine the
optimized hotspot size. Finally, we compare our routing strategy with
other state-of-the-art routing schemes through extensive trace-driven
simulations. The results show that Hoten largely outperforms other
solutions, especially in terms of the combined overhead/packet
delivery ratio and the average number of hops per message.

I. Introduction 

Delay tolerant networks [1] have been applied in many scenarios,
such as the interplanetary internet [2], vehicular ad hoc networks
[3] and content delivery systems [4]. In these scenarios, routing is
one of the most challenging problems, because an end-to-end path
between source and destination may not exist. This leads to
considerable performance degradation for conventional wireless
routing protocols such as AODV or DSR, which were originally designed
for stable network topologies. Hence, new data forwarding algorithms
must be designed for DTNs.

In the past few years, several DTN routing schemes (e.g., epidemic
[6] and data MULEs [7]) have been proposed to deal with this problem.
Among them, the epidemic scheme is a feasible way to forward messages
from a sender to a potential receiver when nothing is known about the
mobility of nodes (in the rest of this paper, without loss of
generality, we use the terms "people" and "node" interchangeably),
since it tries to send each message over all possible paths in the
network. The epidemic scheme achieves a low mean delivery delay (MDD)
and a high packet delivery ratio (PDR), but it also consumes a large
amount of system resources because of the many redundant copies.

This deficiency has motivated researchers to develop other data
forwarding algorithms that strike a better tradeoff between packet
delivery ratio and the consumption of system resources by taking
advantage of different contexts (e.g., [8], [9], [10], [11], [12]).
For these schemes, the routing performance depends heavily on the
contexts used to estimate which relay nodes are better placed to
reach the destination. Furthermore, most existing schemes do not take
social structures into account. Yet human walks play an increasingly
critical role in network performance [13] with the recent
popularization of personal hand-held mobile devices, since devices
may lose connection when people move around. Hence, the social
structures of human walks, acquired by mobility characterization
techniques, are of great importance when designing data forwarding
metrics. We therefore focus on how to integrate social structures
into data forwarding algorithms in DTNs. This is a critical yet
challenging task, especially in an intermittently connected
environment.

Recently, a few works have explicitly considered social structures
in DTN routing (e.g., [14], [15]). However, none of them fully
exploits the social structures extracted from real human traces. For
instance, some existing schemes only exploit a virtual community
structure to identify the friendship among nodes and use the
centrality or similarity of nodes to estimate their utilities as
potential relays. The rationale behind these schemes is that the
underlying social structure is more stable than the network topology,
and hence can be used for better relay selection. By analyzing GPS
traces of real-world human walks, we confirm the two known phenomena
indicated in [16], [17]. One is that people tend to move around a set
of popular locations, which are called public hotspots, rather than
moving purely at random. The other is that each person shows a
preference for some particular locations, which are called personal
hotspots in this paper. We believe that both kinds of hotspots are
more stable than the underlying social structure mentioned above,
because public hotspots are formed by superimposing personal
hotspots, and personal hotspots (habits) are stable over time and
across situations [33]. Moreover, existing schemes evaluate the
centrality and similarity of nodes with traditional methods from
social networks or ego networks. We argue that these approaches are
difficult and time-consuming because of the transient node contacts
and the intermittently connected environment.

Taking all of the above issues into account, in this paper we
exploit hotspots to design a new routing metric. Specifically, we
investigate the following two kinds of hotspots. (i) The public
hotspots: a node has a bigger chance of meeting the destination in
these locations than in other places. Hence, we need to identify the
nodes that have a higher centrality than others. (ii) The personal
hotspots: if we can deliver a message to one of the k most popular
personal hotspots of the destination, the message will be received
quickly by the destination. As such, we have to estimate the
similarity between a potential relay and the destination. In
addition, since each person has his/her own personality, we also need
to incorporate this factor into the data forwarding process.

In this paper, we present a novel metric, called Hoten, to address
these challenges. First, we use the relative entropy between the
public hotspots and the personal hotspots to evaluate the centrality
of nodes. Second, we use the inverse symmetrized entropy of the
personal hotspots of two nodes to weigh the similarity between them.
Third, unlike related works, we integrate a new factor, personality,
into our Hoten metric and use the entropy of personal hotspots to
estimate node personality. In addition, we propose a method to
determine the optimized hotspot size. Our main contributions can be
summarized as follows:

• We introduce entropy theory into opportunistic forwarding. Rather
than exchanging neighbors' adjacency matrices [14] or counting the
number of times a node acts as a relay for other nodes on all the
shortest delay paths [15], we exploit hotspots and entropy to
quantify the centrality and similarity of nodes, which keeps Hoten
concise and gives it a low time complexity.

• We take the personality of nodes into account, which makes Hoten's
predictions more accurate than those of existing works, since each
person has his/her own personal habits.

• We use the Hurst parameter to find the optimized hotspot size and
to reduce the influence of the number of hotspots on the bursty
dispersion of traces.

• We conduct extensive experiments on five real DTN traces to
compare Hoten with several state-of-the-art works. The results show
that Hoten largely outperforms other solutions, especially in terms
of the combined overhead/packet delivery ratio and the average number
of hops per message.

We organize the remainder of this paper as follows. Section 
II reviews the related work. Section III presents the process 
for identifying hotspots. Section IV describes our approaches 
to evaluate centrality, similarity and personality metrics. In 
Section V, we make a performance evaluation. Finally, we 
conclude our paper and discuss some future research issues in 
Section VI. 



II. RELATED WORK 

It is challenging to deliver messages through disconnected 
parts of the network. In the past, several schemes have been 
proposed to solve this issue. On the basis of contexts they 
used, we classify them into the following two categories: (i) 
data forwarding without social structures, (ii) data forwarding 
with social structures. 

A. Data Forwarding without Social Structures 

Periodic information based: Several schemes utilize the periodic
information inherent in some mobility patterns to route messages in
DTNs. S. Merugu et al. [18] assumed that global knowledge of the
mobility of nodes could be predicted over a finite or indefinite time
scale, due to the periodicity of node movement. They delivered
messages over a space-time routing table with knowledge of when the
relay would be encountered. Likewise, S. Jain et al. [19] used a
modified Dijkstra algorithm to compute the shortest path between the
source and the destination, assuming that the time at which a message
will arrive at a particular node can be predicted. They presented
several schemes and evaluated their performance based on different
knowledge oracles acquired from the network. Exploiting past traces
of buses to predict future behavior, the authors of [20] presented
MaxProp, which shows better performance than protocols that depend on
proactive knowledge. In addition, the authors of [21] proposed a
source routing scheme for DTNs that takes the expected minimum delay
as the forwarding metric, based on the observation that the motion of
real objects is repetitive but non-deterministic.
Opportunity based: The deficiency of the epidemic scheme has
motivated researchers to develop opportunity based data forwarding
algorithms (e.g., [8], [9], [10], [11], [12]). Most of them strike a
better tradeoff between packet delivery ratio and the consumption of
system resources by taking advantage of different contexts. For these
schemes, the routing performance depends heavily on the contexts used
to estimate which relay nodes are better placed to reach the
destination. For instance, A. Lindgren et al. [8] presented PROPHET,
a probabilistic routing protocol for DTNs, which exploits past
histories of encounters to predict the probability of future
encounters. Similar to [8], CAR (context aware routing) was proposed
in [9]; it exploits Kalman filters and context information, such as
the rate of change of a node's neighbors and its current energy
level, to predict the delivery probability. In addition, J. Leguay et
al. [10] presented MobySpace, a high-dimensional Euclidean space
constructed from the past motion patterns of nodes.

B. Data Forwarding with Social Structures 

Note that most of the aforementioned schemes do not take social
structures into account. However, with the recent popularization of
personal hand-held mobile devices, human walks play an increasingly
critical role in network performance, since devices may fail to
connect with each other when people move around. Recently, a few
works have attempted to uncover the underlying stable network
structure in real traces by using social network analysis techniques
[22]. For example, SimBet [14] exploits the betweenness centrality
and social similarity of ego networks [23] to differentiate nodes.
Messages are forwarded to nodes with relatively large SimBet values
to increase the probability of finding better relays to the final
destination. P. Hui et al. [15] proposed BUBBLE, which combines node
centrality and community structure to make forwarding decisions. They
assumed that each node has a global rank across the whole system and
a local rank within its local community. While a message is outside
the community of the destination, it is forwarded to nodes with a
high global rank; once the message enters the destination community,
it is delivered to nodes with a high local rank in that community.

III. Identifying hotspot 

We present the experimental data-sets used in the paper in Section
III.A. In Section III.B, we describe hotspot division and weight
computation in detail. In Section III.C, we discuss the bursty
dispersion of hotspots.

A. Experimental Data-sets 

We use the following five real DTN data-sets gathered
by the authors of [16], [17] over almost two years (from 2006-08-26
to 2008-04-18), referred to as KAIST, NCSU, New York City,
Orlando and North Carolina State Fair. The characteristics of these
data-sets, such as the intra/inter-contact distribution, have been
explored in several studies (e.g., [17], [24]) and applied to
different scenarios (e.g., the message deletion mechanism in [25]).
By analyzing these traces, we find that they cover a rich diversity
of environments, ranging from a well connected area (State fair) to a
quite sparse one (New York City). We summarize the main features of
the five data-sets in Table I.



TABLE I

Statistics of collected real traces from five sites

Site        No. of traces   Volunteers   Start date   End date
KAIST       92              34           2006-09-26   2007-10-03
NCSU        35              20           2006-08-26   2006-11-16
New York    39              10           2006-10-23   2008-04-18
Orlando     41              18           2006-11-19   2008-01-09
State fair  19              18           2006-10-24   2007-10-21


B. Hotspots Division and Weight Computation 

In this subsection, we first clarify some terms used in this paper,
namely GPS log, GPS trace, stay point and hotspot, and then present
our solutions to hotspot division and weight computation.

GPS log and GPS trace: The data collected by the GPS devices carried
by participants take the form of a GPS log, which is a sequence of
three-tuples (Timestamp, X-coordinate, Y-coordinate). As depicted in
Fig. 1, on a two-dimensional plane we can connect these three-tuples
into a GPS trace according to their time order.

Fig. 1. GPS trace and stay point

Stay point: A stay point P denotes a physical location where a
participant stays longer than a time threshold. There are two
categories of stay points. In the first, a participant remains
stationary for a period exceeding the threshold. In the second, a
person wanders within a small spatial region for a period exceeding
the threshold (in [17], the default values of the threshold and the
radius of the small region are 30 seconds and 5 meters,
respectively). The authors of [26] proposed an algorithm for stay
point detection.
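
The stay-point definition above can be sketched in code. The following Python fragment is a simplified illustration and not the detection algorithm of [26]: it assumes a GPS log given as (timestamp, x, y) tuples in seconds and meters, and it anchors each candidate region at the first sample of a run. The function name `detect_stay_points` and its parameters are hypothetical; the defaults follow the 30-second and 5-meter values quoted from [17].

```python
import math

def detect_stay_points(log, min_duration=30.0, radius=5.0):
    """Return stay points as (mean_x, mean_y, arrive_t, leave_t) tuples.

    log: list of (timestamp_s, x_m, y_m) sorted by timestamp.
    A run of consecutive samples that stays within `radius` meters of the
    run's first sample for at least `min_duration` seconds is one stay point.
    """
    stay_points = []
    i, n = 0, len(log)
    while i < n:
        t0, x0, y0 = log[i]
        j = i + 1
        # Extend the run while samples remain inside the anchor's radius.
        while j < n and math.hypot(log[j][1] - x0, log[j][2] - y0) <= radius:
            j += 1
        run = log[i:j]
        if run[-1][0] - t0 >= min_duration:
            mean_x = sum(p[1] for p in run) / len(run)
            mean_y = sum(p[2] for p in run) / len(run)
            stay_points.append((mean_x, mean_y, t0, run[-1][0]))
            i = j          # continue after the detected stay
        else:
            i += 1         # no stay anchored here; try the next sample
    return stay_points
```

Because the anchor is the first sample of a run, this sketch covers both categories: a stationary participant and one who wanders inside the small region.
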
Hotspot division: Different from the virtual community structure
used in [15], a hotspot is defined to be a physical region with an
area of d by d. Since different values of d result in different
numbers of hotspots, which in turn influences the self-similarity of
traces [16], we need to find the optimized value of d. In this paper,
we use the Hurst parameter to explore the influence of the number of
hotspots on the bursty dispersion of traces, and we take the value of
d that maximizes the Hurst parameter as the optimized hotspot size.
Mathematically, let $D$ denote the set of candidate values of $d$,
let $H$ denote the set of Hurst parameter values of the traces, and
let $f$ denote the mapping from $D$ to $H$:

$$f : D \to H$$

There exists $d_{optimized} \in D$ such that $h_{max} = \max(f)$,
where $h_{max} \in H$. We iterate over different values of $d$ and
observe their effect on the values of $h$, which are estimated with
the aggregated variance method. Fig. 2 shows the results. We set
$d_{optimized} = f^{-1}(h_{max})$, where $f^{-1}$ is the inverse
function of $f$.
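
As a rough illustration of the aggregated variance method mentioned above, the following Python sketch estimates the Hurst parameter of a numeric series and then picks the hotspot size with the largest estimate. It is a minimal sketch under stated assumptions: the choice of series fed to the estimator (for example, a sequence derived from the traces for a given d), the block sizes, and the names `hurst_aggregated_variance` and `optimized_size` are all hypothetical and not taken from the paper.

```python
import math
import random

def hurst_aggregated_variance(series, block_sizes=(1, 2, 4, 8, 16, 32, 64)):
    """Estimate the Hurst parameter H of a series.

    For each block size m, the series is cut into non-overlapping blocks and
    the variance of the block means is computed. For a self-similar process
    that variance scales like m**(2H - 2), so H = 1 + slope / 2, where slope
    is the least-squares slope of log(variance) against log(m).
    """
    xs, ys = [], []
    for m in block_sizes:
        n_blocks = len(series) // m
        if n_blocks < 2:
            continue
        means = [sum(series[b * m:(b + 1) * m]) / m for b in range(n_blocks)]
        mu = sum(means) / n_blocks
        var = sum((v - mu) ** 2 for v in means) / (n_blocks - 1)
        if var > 0:
            xs.append(math.log(m))
            ys.append(math.log(var))
    if len(xs) < 2:
        raise ValueError("series too short for the chosen block sizes")
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return 1.0 + slope / 2.0

def optimized_size(h_by_d):
    """Pick d_optimized = argmax over d of f(d), given {d: Hurst estimate}."""
    return max(h_by_d, key=h_by_d.get)

# Usage sketch: uncorrelated noise should give an estimate near H = 0.5.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
h_noise = hurst_aggregated_variance(noise)
```

Values of H well above 0.5 indicate bursty, self-similar behavior, which is why the largest estimate is used to select d.
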
Weight computation: As stated above, we divide each of the five
scenarios into non-overlapping d by d squares, each of which
represents one hotspot. We use the weight of each hotspot to denote
its popularity: the larger the weight, the more popular the hotspot.
There are several methods to estimate the weight of hotspots [27].
Here we take a simple but efficient solution, called the count
process: we count the number of stay points within each hotspot and
then compute the weight of each hotspot by normalizing the count.

Let $K$ denote the number of hotspots in the network, let $n_i$
denote the number of stay points in hotspot $i$, and let $w_i$ denote
the weight of that hotspot. We have:

$$w_i = \frac{n_i}{\sum_{k=1}^{K} n_k} \qquad (1)$$

Similarly, let $n^j_{personal_i}$ denote the number of the $i$th
person's stay points in the $j$th hotspot, and let $w^j_{personal_i}$
denote the weight of that hotspot as influenced by the $i$th person.
We have:

$$w^j_{personal_i} = \frac{n^j_{personal_i}}{\sum_{k=1}^{K} n^k_{personal_i}} \qquad (2)$$
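
The count process can be sketched as follows. This is an illustrative Python fragment, assuming stay points are given as (x, y) positions in meters and that hotspots are indexed by the grid cell of the d by d square they fall in; `hotspot_index` and `hotspot_weights` are hypothetical helper names.

```python
from collections import Counter

def hotspot_index(x, y, d):
    """Map a position to the (column, row) index of its d-by-d square."""
    return (int(x // d), int(y // d))

def hotspot_weights(stay_points, d):
    """Normalized stay-point counts per hotspot (the count process).

    stay_points: iterable of (x, y) positions in meters.
    Returns {hotspot_index: weight}; the weights sum to 1.
    """
    counts = Counter(hotspot_index(x, y, d) for x, y in stay_points)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cell: n / total for cell, n in counts.items()}
```

Passing the stay points of all participants yields the public weights, while passing the stay points of a single participant yields that person's personal weights.
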
Fig. 2. The distribution of public hotspot weight for the five
scenarios (panels include (d) Orlando and (e) State fair)



An online approach to identify public hotspots: Clearly, a DTN user
cannot acquire global knowledge such as the public hotspots. We
therefore aggregate personal hotspots to identify the public hotspots
of the system. Each node carries a hotspot matrix $H_{N \times K}$,
where $N$ is the number of nodes. Taking node $i$ as an example, the
initial elements of row $i$ are $h_{ij} = w^j_{personal_i}$, and all
other elements are 0. When two nodes meet, they exchange their
matrices and update $H$ with the information from their neighbor.
After that, they estimate the weight $w_j$ of the $j$th public
hotspot by summing the elements of $V_j$, where $V_j$ is the $j$th
column of $H$. Finally, they normalize each $w_j$ and use the
normalized weights to compute the betweenness centrality (please
refer to Section IV.D).
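
The exchange-and-aggregate step can be sketched as below. This Python fragment is only an illustration: it stores the hotspot matrix as a dictionary of known rows, and it assumes that when both nodes already know a row the two copies agree, so either can be kept. The names `merge_hotspot_matrices` and `public_weights` are hypothetical.

```python
def merge_hotspot_matrices(h_a, h_b):
    """Merge the hotspot matrices of two nodes that meet.

    Each matrix is a dict {node_id: [w_1, ..., w_K]} holding the personal
    hotspot weights learned so far; a missing row stands for an all-zero row.
    Both nodes end up with the union of the known rows.
    """
    merged = dict(h_a)
    merged.update(h_b)
    return merged

def public_weights(h, k):
    """Estimate public hotspot weights by normalizing the column sums of H."""
    column_sums = [sum(row[j] for row in h.values()) for j in range(k)]
    total = sum(column_sums)
    if total == 0:
        return [0.0] * k
    return [s / total for s in column_sums]
```

For example, merging `{0: [0.5, 0.5, 0.0]}` with `{1: [0.0, 0.5, 0.5]}` gives both nodes two known rows, from which each estimates the public weights of the three hotspots.
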

C. Bursty Dispersion of Hotspots 

The phenomenon of bursty dispersion (i.e., self-similarity) of
hotspots implies that people tend to swarm around a few very popular
locations, which means that a few particular locations suffice to
identify an individual trace. Hence, the size of control packets is
reduced considerably.
Bursty dispersion of public hotspots: The burstiness of public
hotspots means that popular locations become more popular as
individual bursty traces are superimposed. Fig. 2 portrays the
distribution of public hotspots of the five sites, which shows a
clearly bursty pattern and coincides with the theory of preferential
attachment proposed in [28].
Bursty dispersion of personal hotspots: The burstiness of personal
hotspots implies that an individual user spends most of the time in a
few particular locations, consciously or unconsciously. On average,
only about 1% to 14.7% of the hotspots are visited by each
participant, as shown in Table II (here we only consider the top k
hotspots whose weights sum to at least 0.9, i.e., which cover at
least 90% of a participant's total hotspot weight). Fig. 4 depicts
the distribution of personal hotspots for State fair (in each
scenario, we randomly choose two people as samples; the other
scenarios show similar features and are omitted due to space
limitations), which shows a bursty pattern like that of the public
hotspots. Two phenomena are visible in Fig. 2 and Fig. 4. First,
different people may have different preferred locations, i.e.,
different personal habits. Second, personal hotspots are burstier
than public hotspots. Both phenomena inspire us to estimate the
centrality, personality and similarity of people.
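
The top-k selection used for Table II can be sketched as follows. This Python fragment is a minimal illustration assuming personal weights are given as a dictionary that sums to 1; the name `top_hotspots` and the small floating-point tolerance are assumptions made for the example.

```python
def top_hotspots(weights, coverage=0.9):
    """Return the smallest set of top-weight hotspots reaching `coverage`.

    weights: dict {hotspot_id: weight} with weights summing to 1.
    Hotspots are taken in decreasing weight order until their cumulative
    weight is at least `coverage` (up to a small floating-point tolerance).
    """
    selected, cumulative = [], 0.0
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    for hotspot, w in ranked:
        if cumulative >= coverage - 1e-12:
            break
        selected.append(hotspot)
        cumulative += w
    return selected
```

The ratio of visited hotspots for one participant is then `len(top_hotspots(weights)) / K`, where K is the total number of hotspots in the scenario.
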



TABLE II

The average ratio of visited hotspots in each trace

KAIST   NCSU    New York   Orlando   State fair
0.01    0.018   0.07       0.01      0.147



IV. Implementing hotspots into Hoten 

We present our solution in this section. In Section IV.A, 
we explore the centrality of a node. We analyze the similarity 
between nodes in Section IV.B. In Section IV.C, we present 
personality. We finally exploit entropy and hotspot to design 
Hoten metric in Section IV.D. 

A. Centrality 

Node centrality reflects the relative importance of a node in the
network (i.e., how popular a person is within a social network). The
more important a person is, the bigger the chance of meeting other
people. Freeman [29], [30] proposed the three most widely used
measures of centrality: degree, closeness and betweenness.
Degree centrality: Degree centrality is measured as the number of
one-hop neighbors of a given node i, which reflects the direct
relationship between node i and its neighbors. A node with a higher
degree centrality can directly contact more nodes. The degree
centrality of node i is computed as:

$$C_D^i = \sum_{j=1}^{N} p_{ij} \qquad (3)$$

where N is the number of nodes in the network and $p_{ij} = 1$ if
node j is one of the neighbors of node i, and 0 otherwise.

It is not easy to compute degree centrality in DTNs because the
number of direct contacts of a node varies over time. One option is
to set a time window and count the neighbors of a node within it.
However, it is unclear how to determine the optimal size of the time
window.

Closeness centrality: Closeness centrality shows the "closeness" of
a node to all other reachable nodes. Freeman measured it as the
reciprocal of the average geodesic length d(i,j) (i.e., the length of
the shortest path) from node i to all other reachable nodes [30].
Closeness centrality of a node also reflects the node's freedom from
the network, and is calculated as:

$$C_C^i = \frac{N - 1}{\sum_{j=1}^{N} d(i,j)} \qquad (4)$$

In DTNs, it is hard to work out the geodesic length d(i,j), because
an end-to-end path between node i and node j is not guaranteed.

Betweenness centrality: Betweenness centrality reflects the
controlling capability of a node over other nodes. It measures the
extent to which a node falls on the shortest paths between two other
nodes. The higher the betweenness centrality of a node, the greater
its ability to facilitate communication between other nodes in the
network. The betweenness centrality of a node i is computed as:

$$C_B^i = \sum_{j=1}^{N} \sum_{k=1}^{j-1} \frac{g_{jk}(i)}{g_{jk}} \qquad (5)$$

where $g_{jk}$ is the total number of shortest paths between node j
and node k, and $g_{jk}(i)$ is the number of those paths that include
node i.

Fig. 3. Adjacency matrices for nodes 0 and 1 at different time
instants: (a) t=300s, (b) t=3000s. Black: contacts for $A_1$; dark
gray: contacts for both $A_1$ and $A_0$; light gray: contacts for
$A_0$; white: no contact. Due to space limitations, we take the State
Fair data-set as a sample.

Obviously, betweenness centrality becomes difficult to evaluate as
the number of nodes increases, due to its high time complexity.
Moreover, as with closeness centrality, it is even more difficult to
work out in DTNs. For example, the authors of [14] used an adjacency
matrix A to represent node contacts, with elements $A_{ij} = 1$ if
there has been at least one contact between node i and node j at any
past time and $A_{ij} = 0$ otherwise. The betweenness centrality can
thus be estimated as:



(6) 



The matrices become more and more alike as contacts aggregate, as
shown in Fig. 3 (when two nodes meet, they swap their neighbor lists
to update the matrix). As a consequence, the heterogeneity of the
nodes cannot be well reflected, which in turn impairs network
performance (please refer to Section V). On the other hand, if we use
a sliding time window as the authors of [15] did, we have to
determine the optimal size of the time window, which is non-trivial
as well. We discuss how to exploit hotspots to solve this problem in
Section IV.D.

B. Similarity 

Similarity reflects the associations between nodes in the network.
Sociologists observed long ago the phenomenon, called "clustering" in
physics, that two people who have one or more friends in common are
likely to be friends themselves.

The number of common neighbors between nodes has an important
influence on the dissemination speed of messages in DTNs. When the
neighbors of nodes contact each other frequently, message diffusion
can be expected to proceed faster than when the association between
nodes is weaker. That is, nodes having a stronger association with a
given node are good relay candidates for message diffusion to that
node. The general approach exploits some context to estimate the
degree of association. For example, the authors of [31] took
advantage of mailing lists to match the relationships between people
in the real world. The authors of [32] inferred the associations
between bloggers by analyzing the links found in a large number of
blogs.

However, it is difficult to count the number of common neighbors (or
other items such as common mail list items [31] or common linking
objects [32]), for the same reasons mentioned in Section IV.A.
C. Personality

Personality reflects the unique characteristics (or behavior) of a
given person. The psychologist G. W. Allport [33] defined personality
as "a general neuropsychic structure unique to the individual with
the capacity to render many stimuli functionally equivalent and to
instigate and guide consistent (equivalent) forms of adaptive and
stylistic behavior." Allport suggested that personality
characteristics are relatively stable over time and across
situations.

The personality characteristics mainly include tendentiousness,
complexity, uniqueness, positiveness and stability. We believe that
personal hotspots reflect at least the tendentiousness, uniqueness
and stability of personality, as shown in Fig. 2 and Fig. 4, since
public hotspots are superimposed personal hotspots and, moreover,
each person has his/her own personal habits, which are stable once
formed. Hence, it is necessary and meaningful to exploit personal
hotspots to make comparisons across people. In the next subsection,
we describe how to integrate personality into the Hoten metric.

D. Hoten 

In this subsection, we use entropy theory to compute the betweenness
centrality, similarity and personality of nodes. Entropy denotes the
degree of disorder or randomness in a system: the bigger the entropy
value, the more disordered the system. More specifically, we use the
relative entropy between the public hotspots and the personal
hotspots to evaluate the betweenness centrality of a node, the
inverse symmetrized entropy of the personal hotspots of two nodes to
compute the similarity between them, and the entropy of the personal
hotspots of a node to estimate its personality.

Let the random variable $X_i$ denote the distribution of personal
hotspots of node i, let the random variable $Y$ denote the
distribution of public hotspots, and let $p(x_i^j)$ and $p(y_j)$
denote the weights of the jth personal hotspot and the jth public
hotspot, respectively. We have:

$$p(x_i^j) = w^j_{personal_i} \qquad (7)$$

$$p(y_j) = w_j \qquad (8)$$

Betweenness centrality computation: Relative entropy (also called
Kullback-Leibler divergence) measures the divergence between two
random distributions. If the relative entropy equals zero, the two
random variables have the same distribution (i.e., if $X_i$ has the
same distribution as $Y$, we say that node i has the highest
betweenness centrality in the network). Let $C_B^i$ denote the
betweenness centrality of node i. We have

$$C_B^i = \sum_{j=1}^{K} p(x_i^j) \log \frac{p(x_i^j)}{p(y_j)} \qquad (9)$$

Substituting equations (7) and (8) into equation (9), we have

$$C_B^i = \sum_{j=1}^{K} w^j_{personal_i} \log \frac{w^j_{personal_i}}{w_j} \qquad (10)$$

Compared with equations (5) and (6), our solution is more concise
and has a low time complexity of O(K), which depends only on the
number of hotspots and is independent of the number of nodes in the
network.
Similarity computation: Relative entropy is not symmetric, i.e., the
relative entropy of $X_i$ over $X_j$ does not equal that of $X_j$
over $X_i$. We therefore use the inverse symmetrized entropy to
estimate the similarity $Sim(i,j)$ between node i and node j:

$$Sim(i,j) = \bigl(Sim(i/j) + Sim(j/i)\bigr)^{-1} \qquad (11)$$

where

$$Sim(i/j) = \sum_{l=1}^{K} w^l_{personal_i} \log \frac{w^l_{personal_i}}{w^l_{personal_j}}$$

and

$$Sim(j/i) = \sum_{l=1}^{K} w^l_{personal_j} \log \frac{w^l_{personal_j}}{w^l_{personal_i}}.$$
Personality computation: Let $Per_i$ denote the personality of node
i. According to the definition of entropy, we have

$$Per_i = -\sum_{l=1}^{K} w^l_{personal_i} \log w^l_{personal_i} \qquad (12)$$

To keep the above equations well defined, we set $w_j = \delta$ and
$w^j_{personal_i} = \delta$ whenever they equal zero, where $\delta$
is a constant.
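
The three entropy-based quantities can be sketched in a few lines of Python. This is an illustrative fragment rather than the authors' implementation: weight vectors are plain lists of length K, the natural logarithm is assumed, `DELTA` reuses the value of the constant reported in Section V, and returning infinity for two identical distributions is an assumption made to avoid a division by zero.

```python
import math

DELTA = 1e-6  # small constant replacing zero weights (value from Section V)

def smooth(weights, delta=DELTA):
    """Replace zero weights by delta so that logs and ratios stay finite."""
    return [w if w > 0 else delta for w in weights]

def relative_entropy(p, q):
    """Kullback-Leibler divergence: sum over j of p_j * log(p_j / q_j)."""
    p, q = smooth(p), smooth(q)
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, q))

def centrality(personal, public):
    """Relative entropy of personal hotspot weights over public ones."""
    return relative_entropy(personal, public)

def similarity(personal_i, personal_j):
    """Inverse of the symmetrized relative entropy between two nodes."""
    sym = (relative_entropy(personal_i, personal_j)
           + relative_entropy(personal_j, personal_i))
    return 1.0 / sym if sym > 0 else math.inf

def personality(personal):
    """Shannon entropy of a node's personal hotspot weights."""
    return -sum(w * math.log(w) for w in smooth(personal))
```

Each function makes a single pass over the K hotspot weights, so its cost grows with the number of hotspots and not with the number of nodes.
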
Hoten metric: The Hoten metric is a value between 0 and 1 and is
calculated by integrating the above three components. Selecting the
best relay for a message thus becomes a multi-objective optimization
problem, which we solve with the linear weighting method. Let
$BetUtil_i$, $SimUtil_i(n_d)$ and $PerUtil_i$ denote the betweenness
utility, similarity utility and personality utility of node i for
delivering a message to destination node $n_d$ when meeting node j,
respectively. Using the normalized relative weights of these
attributes, we have

$$BetUtil_i = \frac{C_B^i}{C_B^i + C_B^j} \qquad (13)$$

$$SimUtil_i(n_d) = \frac{Sim(i, n_d)}{Sim(i, n_d) + Sim(j, n_d)} \qquad (14)$$

$$PerUtil_i = \frac{Per_i}{Per_i + Per_j} \qquad (15)$$

According to the linear weighting method, we have

$$Hoten_i(n_d) = \alpha \, BetUtil_i + \beta \, SimUtil_i(n_d) + \gamma \, PerUtil_i \qquad (16)$$

where $\alpha$, $\beta$ and $\gamma$ are system parameters and
$\alpha + \beta + \gamma = 1$.
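
The linear weighting step can be illustrated with a short Python sketch. The function names `share` and `hoten` are hypothetical, the inputs are assumed to be finite, non-negative values computed beforehand, and the fallback of 0.5 when both values are zero is an assumption made to keep the ratio defined.

```python
def share(mine, other):
    """Normalized relative weight mine / (mine + other); 0.5 if both are 0."""
    total = mine + other
    return mine / total if total > 0 else 0.5

def hoten(c_i, c_j, sim_i, sim_j, per_i, per_j,
          alpha=1/3, beta=1/3, gamma=1/3):
    """Hoten utility of node i for a destination when it meets node j.

    c_*: betweenness centrality values; sim_*: similarity of each node to
    the destination; per_*: personality values. The three system
    parameters must sum to 1.
    """
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("alpha + beta + gamma must equal 1")
    return (alpha * share(c_i, c_j)
            + beta * share(sim_i, sim_j)
            + gamma * share(per_i, per_j))
```

Since each utility is a share of a pairwise total, the values computed for node i and for node j in the same encounter sum to 1, which keeps the metric between 0 and 1.
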
Hoten routing: We outline Hoten routing in Algorithm 1, which
presents the communication process between node i and node j. Take
node i as an example. When it meets node j, for any message m that i
carries, if the destination $m_d$ of the message is node j, node i
delivers m to node j and removes it from its message queue.
Otherwise, if node j does not hold this message, the two nodes
exchange their Hoten utilities. If $Hoten_i(m_d)$ is smaller than
$Hoten_j(m_d)$, node i forwards the message to node j and removes m
from its buffer, i.e., Hoten is a single-copy scheme.



Algorithm 1 Hoten Algorithm, pseudo-code of node i
upon meeting node j do
    for any message m in i's queue do
        if m_d == j then
            deliverMsg(m)
            remove(m)
        else if m not in j's queue then
            i <- Hoten_j(m_d) {receive j's Hoten utility}
            isForwarding(m) {make forwarding decision}
        end if
    end for

isForwarding(m)
    if Hoten_i(m_d) < Hoten_j(m_d) then
        forwardingMsg(m)
        remove(m)
    end if
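
The forwarding logic of Algorithm 1 can be sketched as runnable Python. This is an illustrative fragment under simplifying assumptions: the `Node` class, its fields and the `on_encounter` function are hypothetical, and each node's Hoten value is supplied by a callable that depends only on the destination, whereas the actual metric also depends on the encountered node.

```python
class Node:
    """Minimal node model; `utility(dest)` returns the node's Hoten value."""

    def __init__(self, node_id, utility):
        self.node_id = node_id
        self.utility = utility   # callable: destination id -> Hoten value
        self.buffer = {}         # message id -> destination id
        self.received = set()    # ids of messages addressed to this node

def on_encounter(i, j):
    """Run node i's side of Algorithm 1 when it meets node j."""
    for msg_id, dest in list(i.buffer.items()):
        if dest == j.node_id:
            j.received.add(msg_id)       # deliver to the destination
            del i.buffer[msg_id]
        elif msg_id not in j.buffer:
            # Single-copy forwarding: hand over the message only if j's
            # Hoten value for this destination exceeds i's.
            if i.utility(dest) < j.utility(dest):
                j.buffer[msg_id] = dest
                del i.buffer[msg_id]
```

In a trace-driven simulation, `on_encounter(i, j)` and `on_encounter(j, i)` would both be called for every contact, so that each node processes its own queue.
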



V. PERFORMANCE EVALUATION 

We take Epidemic routing as a baseline and compare the performance
of Hoten with that of the SimBet metric.

A. Simulation Setup 

We use the aforementioned five real DTN traces to test the premise
of routing based on social structures. Since the traces have
different run times, for four of the DTN traces (KAIST, NCSU, New
York City and Orlando) we use the minimum runtime in KAIST (15000s)
as the baseline; for the State fair traces, the runtime is set to
6000s. Thus, we get 92, 32, 26, 39 and 19 traces respectively, which
are equal to or slightly smaller than the original numbers (please
refer to Table I). The value of $\delta$ is set to 0.000001. The
parameters of the Hoten metric in equation (16) are all set to 1/3,
which assigns equal importance to the three components. According to
Table II, the ratio k/K is set to 15%. The nodal transmission range
is set to 250m, a typical value for WiFi. In addition, all nodes are
both sources and destinations, i.e., each node sends a single message
to every other node.

B. Performance Criteria 

We evaluate the performance of the three routing protocols
using the following criteria.
Cumulative packet delivery ratio (CPDR): This criterion
represents the delivery performance of the network as the
ratio of successfully received messages to sent
messages. We evaluate the delivery performance of the
three metrics under different message TTLs.
Mean delivery delay: Although DTNs tolerate delay,
a low end-to-end delay is still desirable, since a long delay means
that system resources are occupied for longer.



Average ratio of infected nodes: We use this criterion to
quantify the overhead in the network. Since Hoten and SimBet
both use a single-copy scheme, they are expected to perform
similarly in this respect.

Average number of hops per message: This criterion counts
the successful forwardings of a message until the
destination receives it, so the fewest hops do not necessarily
mean the shortest delay in DTNs. Nevertheless, we still try to
minimize it for two reasons: channel interference and battery
power. Minimizing the number of hops reduces both the
probability of channel interference and
the consumption of battery power.
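To make the four criteria concrete, the following Python sketch computes them from per-message simulation records. The record layout (`MessageRecord` and its fields) is a hypothetical assumption, as is the reading of the infected-node ratio as the average fraction of nodes that ever carried a message; the simulator used in the evaluation may define these quantities differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MessageRecord:
    created_at: float
    delivered_at: Optional[float]  # None if the message was never delivered
    hops: int                      # successful forwardings of the message
    carriers: int                  # distinct nodes that ever held the message


def evaluate(records: List[MessageRecord], ttl: float, num_nodes: int) -> dict:
    """Compute the four criteria for one TTL value."""
    # A message counts as delivered only if it arrived within its TTL.
    delivered = [r for r in records
                 if r.delivered_at is not None
                 and r.delivered_at - r.created_at <= ttl]
    n, d = len(records), len(delivered)
    return {
        "cpdr": d / n if n else 0.0,
        "mean_delay": (sum(r.delivered_at - r.created_at for r in delivered) / d
                       if d else None),
        "avg_hops": sum(r.hops for r in delivered) / d if d else None,
        "infected_ratio": (sum(r.carriers for r in records) / (n * num_nodes)
                           if n else 0.0),
    }
```

Calling `evaluate` once per TTL value on the same records yields the delivery-ratio curve as a function of TTL, while the delay and hop averages are taken over delivered messages only.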

C. Cumulative Packet Delivery Ratio 

Fig. 6 illustrates the packet delivery ratio
under different message TTLs. As expected, Epidemic achieves
the highest CPDR of the three. Compared to SimBet,
Hoten clearly improves the packet delivery
ratio. The reason is that Hoten exploits hotspot
and entropy to estimate the centrality and similarity of nodes
and takes nodal personality into account, which makes Hoten's
prediction more accurate than that of SimBet. An exception
occurs at State Fair, where the CPDR of SimBet is better
than that of Hoten. This is mainly because the adjacency
matrices of the nodes quickly become identical in
well-connected scenarios; the heterogeneity of the nodes is then
poorly reflected, which in turn makes SimBet tend to flood
the messages.

D. Mean Delivery Delay 

Turning to the mean delivery delay (MDD, Fig. 7), Epidemic
outperforms Hoten and SimBet, also as
expected. Compared to SimBet, Hoten does prolong the
MDD. However, Hoten improves the CPDR
in most scenarios (Fig. 6); hence, we conjecture that the
extra delay is caused by messages that would have been
dropped under SimBet but are now delivered to
their destinations under Hoten.

E. Average Ratio of Infected Nodes 

Fig. 8 shows that, as expected, both Hoten and SimBet perform
better than Epidemic in terms of the average ratio of infected
nodes. Epidemic infects almost
every node in the network.

F. Average Number of Hops per Message 

Fig. 9 illustrates the average number of hops per message.
Hoten clearly outperforms the other two schemes.
For example, at KAIST, the average number of hops per
message achieved by Hoten is close to 7, whereas Epidemic
and SimBet lead to longer routing paths, with
average hop values of about 27 and 41 respectively. Interestingly,
the average number of hops per message under
SimBet is even larger than that under Epidemic. We conjecture
that this outlier is due to the repeated infection caused by
SimBet, i.e., when node i has delivered message m to node
j, it deletes m and may receive m again when meeting
another node v, which also carries m and has a lower SimBet
metric than node i.

VI. Conclusion and Future Work 

In this paper, we present a novel routing metric, called
Hoten, to route messages in DTNs. We exploit hotspot and
entropy to design the utility function. First, we use the relative
entropy between the public hotspots and the personal hotspots
to evaluate the centrality of nodes. Second, we utilize the inverse
symmetrized entropy of the personal hotspots of two nodes to
compute the similarity between them. Third, we exploit the
entropy of the personal hotspots to estimate node personality.
In addition, we propose a method to determine the optimized size
of a hotspot. Trace-driven simulation results show that Hoten
largely outperforms other solutions, especially in terms of
combined overhead/packet delivery ratio and the average number
of hops per message.

One significant topic for future work is to study the influence
of the temporal correlation of stay points on the performance
of Hoten.